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Abstract 

Biological systems are modular, and this modularity affects the evolution of bio¬ 
logical systems over time and in different environments. We here develop a theory for 
the dynamics of evolution in a rugged, modular fitness landscape. We show analyti¬ 
cally how horizontal gene transfer couples to the modularity in the system and leads 
to more rapid rates of evolution at short times. The model, in general, analytically 
demonstrates a selective pressure for the prevalence of modularity in biology. We use 
this model to show how the evolution of the influenza virus is affected by the modular¬ 
ity of the proteins that are recognized by the human immune system. Approximately 
25% of the observed rate of fitness increase of the virus could be ascribed to a modular 
viral landscape. 


1 Introduction 

Biological systems are modular, and the organization of their genetic material reflects this 
modularity [1HJ]. Complementary to this modularity is a set of evolutionary dynamics that 
evolves the genetic material of biological systems. In particular, horizontal gene transfer 
(HGT) is an important mechanism of evolution, in which genes, pieces of genes, or mul¬ 
tiple genes are transferred from one individual to another |5|-|7|. Additionally, multi-body 
contributions to the fitness function in biology are increasingly thought to be an important 
factor in evolution j$], leading to a rugged fitness landscape and glassy evolutionary dy¬ 
namics. The combination of modularity and horizontal gene transfer provide an effective 
mechanism for evolution upon a rugged fitness landscape [§(]. The organization of biology 
into modules simultaneously restricts the possibilities for function, because the modular 
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organization is a subset of all possible organizations, and may lead to more rapid evolu¬ 
tion, because the evolution occurs in a vastly restricted modular subspace of all possibilities 
[i9) 0] • Our results explicitly demonstrate this trade off, with t* serving as the crossover 
time from the latter to the former regime. 


Thus, the fitness function in biology is increasingly realized to be rugged, yet modular. 
Nonetheless, nearly all analytical theoretical treatments assume a smooth fitness landscape 
with a dependence only on Hamming distance from a most-fit sequence llj, a linear or 
multiplicative fitness landscape for dynamical analysis of horizontal gene transfer [12!, 13], or 
an uncorrelated random energy model 0,0. Horizontal gene transfer processes on more 
general, but still smooth, landscapes have been analyzed 00. Here we provide, to our 
knowledge, the first analytical treatment of a finite-population Markov model of evolution 
showing how horizontal gene transfer couples to modularity in the fitness landscape. We 
prove analytically that modularity can enhance the rate of evolution for rugged fitness 
landscapes in the presence of horizontal gene transfer. This foundational result in the 
physics of biological evolution offers a clue to why biology is so modular. We demonstrate 
this theory with an application to evolution of the influenza virus. 


We introduce and solve a model of individuals evolving on a modular, rugged fitness land¬ 
scape. The model is constructed to represent several fundamental aspects of biological 
evolution: a finite population, mutation and horizontal gene transfer, and a rugged fitness 
landscape. For this model, we will show that the evolved fitness is greater for a modular 
landscape than for a non-modular landscape. This result holds for t < t* where t* is a 
crossover time, larger than typical biological timescales. The dependence of the evolved 
fitness on modularity is multiplicative with the horizontal gene transfer rate, and the ad¬ 
vantage of modularity disappears when horizontal gene transfer is not allowed. Our results 
describe the response of the system to environmental change. In particular, we show that 
modularity allows the system to recover more rapidly from change, and fitness values at¬ 
tained during the evolved response to change increase with modularity for large or rapid 
environmental change. 


2 Theory of the Rate of Evolution in a Rugged Fitness Land¬ 
scape 

We use a Markov model to describe the evolutionary process. There are N individuals. 
Each individual a replicates at a rate f a . The average fitness in the population is defined 
as (/) = fa- Each individual has a sequence S a that is composed of L loci, sf. 

For simplicity, we take sf = ±1. Each of the loci can mutate to the opposite state with 
rate fi. Each sequence is composed of I\ modules of length l = L/K. A horizontal gene 
transfer process randomly replaces the fcth module in the sequence of individual cc, e.g. the 
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sequence loci s? fe _ 1 ^ +1 ... s^, with the corresponding sequences from a randomly chosen 
individual (3 at rate u. The a priori rate of sequence change in a population is, therefore, 
NfiL + NvL/2. Since the fitness landscape in biology is rugged, we use a spin glass to 
represent the fitness: 


f[S] = 2 L + H[S] 

H-is] = 'Y, sjSjjjjAjj (i) 

ij 


Jij is a quenched, Gaussian random matrix, with variance 1/C. As discussed in Appendix 
A, the offset value 2 L is chosen by Wigner’s semicircle law so that the minimum eigenvalue 
of / is non-negative. The entries in the matrix A are zero or one, with probability C/L per 
entry, so that the average number of connections per row is C. We introduce modularity 
by an excess of interactions in A along the l x l block diagonals of the L x L connection 
matrix. There are K of these block diagonals. Thus, the probability of a connection 
is Cq/L when [i/l\ / [j/l\ and C\/L when [i/l\ = [j/l\- The number of connections 
is C = Co + (C\ — Cq)/K. Modularity is defined by M = ( C\ — Cq)/(KC) and obeys 
-1/(K -1) < M < 1. 

The Markov process describing these evolutionary dynamics includes terms for replication 
(/), mutation (^r), and horizontal gene transfer (v)\ 
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Here n a is the number of individuals with sequence S a . with the vector index a used to 
label the 2 L sequences. This process conserves N = ^ a n a . The notation da means the 
L sequences created by a single mutation from sequence S a . The notation a/b/,. means 
the sequence created by horizontally gene transferring module k from sequence S' b into 
sequence S a . 

We consider how a population of initially random sequences adapts to a given environment, 
averaged over the distribution of potential environments. For example, in the context of 


,n b ;t) 
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influenza evolution, these sequences arise, essentially randomly, by transmission from swine. 
As discussed in Appendix B, a short-time expansion for the average fitness can be derived 
by recursive application of this master equation: 


</(*)> 

a 

b 


2 L -f- at -f- bt 

2L | 1 - — 

N 


4 L 2 

w 

-2 vL 


1 

1 “ N 
1 

1- 

K 


- 4 »L (l - 4) 

(1-«,(!-!) 
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Result Q is exact for all finite N. Note that the effect of modularity enters at the 
quadratic level and requires a non-zero rate of horizontal gene transfer, v > 0. We see 
that (/m> o(t)} > {fo(t)} for short times. 


From the master equation, we also calculate the sequence divergence, defined as D = 

As discussed in Appendix C, recursive application of Eq. (j2j) 
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(4) 


The sequence divergence does not depend on modularity at second order in time. Note the 
terms at order t n not proportional are due to discontinuous changes in sequence 

resulting from the fixed population size constraint. 

Introduction of a new sequence into an empty niche corresponds to an initial condition of 
identical, rather than random, sequences. The population dynamics again follows Eq. ©■ 
To see an effect of modularity, an expansion to 4th order in time is required. As discussed 
in Appendix B, we find 

(f(t)) = 2 L + bt 2 + ct 3 + dt 4 (5) 
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with 


(2 !)& = 16/zL(l — 1/N) 

(3!)c = —64/xL 2 (l — 1/N)/N — 192/?L(1 — 1/-/V) — 32/rzzL(l — 1/N)/N 

(4!)d = 128ju 2 iyLM(l - 1/K)(1 - l/N)(l - 4 /N) + 64//L(l - 1/1V)[2(6 + 14// 2 - fiv) 

+2/«/(l - 4 /N)/K - 2(36 - 10 pL - 9/au)/N + (4L 2 + 2L + 92 + 4i/L + z?)/fV 2 ] 

( 6 ) 

Interestingly, the fitness function initially increases quadratically. The modularity depen¬ 
dence enters at fourth order, and (/m>o(?) > (/o(t)) for short times. 


3 Comparison of Theory to Simulation Results 


The response function of the modular and non-modular system can be computed numeri¬ 
cally as well. We start the system off with random initial sequences, so that the average 
initial (/(0)) — 2 L is zero, and compute the evolution of the average fitness, ( f[t )}. In Fig. 
[T]we show the results from a Lebowitz-Gillespie simulation. We see that (/m(?) > (/o(?) 
for t < t* for M = 1. That is, the modular system evolves more quickly to improve 
the average fitness, for times less than a crossover time, t*. For t > t*, the constraint 
that modularity imposes on the connections leads to the non-modular system dominating, 

> {f M (t )}■ 

Eq. ([3]) shows these results analytically, and we have checked Eq. dH) by numerical sim¬ 
ulation as well. For the parameter values of Fig. |T| and M = 0, theory predicts a/L = 
0.019998, b/L = —0.011636, and simulation results give a/L = 0.020529 ± 0.000173, b/L = 
—0.011306 ±0.000406. For M = 1, theory predicts a/L = 0.019998, b/L = —0.002041, and 
simulation results give a/L — 0.019822 ± 0.000200, b/L = —0.0019084 ± 0.000177 

Since the landscape is rugged, we expect the time it takes the system to reach a given fitness 
value, ( f[t )), starting from random sequences to grow in a stretched exponential way at 
large times. Moreover, since the combination of horizontal gene transfer and modularity 
improves the response function at short times, we expect that the difference in times 
required to reach ( f(t )) of a non-modular and modular system also increases in a stretched 
exponential way, by analogy with statistical mechanics, in which we expect the spin glass 
energy response function to converge as 


(f(t)) ~ foo - cln 2/u t/t 0 


(7) 


with v = 1 [21]. Figured] shows the fit of the data to this functional form, with = 0.958, 
c = 0.067, to = 3.168 for M = 0 and foo = 0.922, c = 0.004, to = 7.619 for M = 1. Figured] 
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Figure 1: Shown is the population average fitness for the modular (M = 1, solid) and 
non-modular (M = 0, dashed) systems. Fitness is normalized by L and the offset 2 L is 
subtracted. The modular system evolves to a greater fitness value for times t < t*. Here 
t* « 182. Here Eq. dTJ) has been scaled by e = 0.1, and L = 100, N = 10000, fi = 0.05, 
v = 0.6, K = 5, and C = L/K — 1, motivated by application to evolution of influenza to 
be discussed below. In inset is shown the fit of the data to the form of Eq. ©. 


shows the stretched exponential speedup in the rate of evolution that modularity provides. 


The prediction for evolution of a population of identical sequences in a new niche is shown 
in Eq. ([6]). Analytically, the effect of modularity shows up at 4th order rather than 2nd 
order when all sequences are initially identical. Qualitatively, however, at all but the very 
shortest times, the results for random and identical sequences are similar, as shown in Figs. 

[T| and [3j 
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Figure 2: Shown is the difference in time required, At, for the modular (M = 1) and 
non-modular (M = 0) system to reach a specified average fitness value for times t < t *, 
starting from random initial sequences. Parameter values are as in Fig. [1] 

4 Average Fitness in a Changing Environment 

We show how to use these results to calculate the average fitness in a changing environment. 
We consider that every T time steps, the environment randomly changes. During such 
an environmental change, each Jij in the fitness, Eq. ([!]), is randomly redrawn from the 
Gaussian distribution with probability p. That is, on average, a fraction p of the fitness is 
randomized. Due to this randomization, the fitness immediately after the environmental 
change will be 1 — p times the value immediately before the change, on average. This 
condition allows us to calculate the average time-dependent fitness during evolution in one 
environment as a function of p and T, given only the average fitness starting from random 
initial conditions, ( f(t )) [22j]. We denote the average fitness reached during evolution in 
one environment as This is related to the average fitness with random initial 

conditions by / Pi t(M) = (), where t is chosen to satisfy (— T) = (1 — 
p)(f(M))(t) due to the above condition. Thus, for high rates or large magnitudes of 
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Figure 3: Shown is the population average fitness for the modular (M = 1, solid) and non- 
modular (M = 0, dashed) systems when all individuals initially have the same sequence. 
Fitness is normalized by L and the offset 2 L is subtracted. The modular system evolves to 
a greater fitness value for times t < t*. Here t* ~ 230. Parameter values are as in Fig. [1} 


environmental change, Eq. ([3]) can be directly used along with these two conditions to 
predict the steady-state average population fitness. 

Figured] shows how the average fitness depends on the rate and magnitude of environmental 
change. The average fitness values for a given modularity are computed numerically. The 
figure displays a crossing of the modularity-dependent fitness curves. The curve associated 
with the largest modularity is greatest at short times. There is a crossing to the curve 
associated with the second largest modularity at intermediate times. And there is a crossing 
to the curve associated with the smallest modularity at somewhat longer times. In other 
words, the value of modularity that leads to the highest average fitness depends on time, 
T, decreasing with increasing time. 

Figured] demonstrates that larger values of modularity, greater M, lead to higher average 
fitness values for faster rates of environmental change, smaller T. The advantage of greater 

















Figure 4: Shown is the fitness, f P: T{M), in an environment that changes with characteristic 
time T and magnitude a) p = 0.5 or b) p = 0.9. c) The fitness is replotted as a function 
of T/p. For each p, the fitness is shown for three values of modularity: M = 0.6 (solid), 
M = 0.8 (dotted), and M = 1 (dashed). Parameters are as in Fig. [D 


modularity persists to larger T for greater magnitudes of environmental change, larger 
p. In other words, modularity leads to greater average fitness values either for greater 
frequencies of environmental change, 1/T, or greater magnitudes of environmental change, 

p. 

It has been argued that an approximate measure of the environmental pressure is given 
by the product of the magnitude and frequency of environmental change p/T [3]. If this 
is the case, then the curves in Fig. |4jib as a function of T for different values of p should 
collapse to a single curve as a function of T/p. As shown in Fig. Hk, this is approximately 
true for the three cases M = 0.6, M = 0.8, and M = 1. 
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5 Application of Theory to Influenza Evolution Data 


We here apply our theory to the evolution of the influenza virus. Influenza is an RNA 
virus of the Orthomyxoviridae family. The 11 proteins of the virus are encoded by 8 gene 
segments. Reassortment of these gene segments is common 23H25I]. In addition, there is 
a constant source of genetic diveristy, as most human influenza viruses arise from birds, 
mix in pigs, with a select few transmitted to humans 23-26|. We model a simplification of 
this complex coevolutionary dynamics with the theory presented here. We consider I\ ~ 5 
modules, with mutations in the genetic material encoding them leading to an effective 
mutation rate at the coarse-grained length scale. We do not consider individual amino 
acids, but rather a coarse-grained L = 100 reprsentation the viral protein material. The 
virus is under strong selective pressure to adapt to the human host, having most often arisn 
in birds and trasmitted through swine. The virus is also under strong pressure from the 
human immune system. The virus evolves in response to this pressure, thereby increasing 
its fitness. The increase in fitness was estimated by tracking the increase in frequency 
of each viral strain, as observed in the public sequence databases: In [fi(t + l)//j(t)] = 
Xi(t + 1 )/xi(t), where fi(t) is the fitness of clade i at time t , Xi(t) is the frequency of clade 
i among all clades observed at time t, and Xi(t + 1) is the frequency at t + 1 predicted by a 
model that includes a description of the mutational processes [27]. Here “clade” is a term for 
the quasispecies of closely-related influenza sequences at time t. We use these approximate 
fitness values, estimated from observed HA sequence patterns and an approximate point 
mutation model of evolution, for comparison to the present model. 


Influenza evolves within a cluster of closely-related sequences for 3-5 years and then jumps 
to a new cluster 28-30]. Indeed, the fitness flux data over 17 years 271] shows a pattern of 
discontinuous changes e very 3-5 years. Often these jumps are related to influx of genetic 


material from swine 23426!]. By clustering the strains, the clusters and the transitions be¬ 


tween them can be identified: the flu strain evolved from Wuhan/359/95 to Sydney/5/97 
at 1996-1997, Panama/2007/1999 to Fujian/411/2002 at 2001-2002, California/7/2004 to 
Wisconsin/67/2005 at 2006-2007, Brisbane/10/2007 to BritishColumbia/RV1222/09 at 
2009-2010 [30]. These cluster transitions correspond to the discontinuous jumps in the 
fitness flux evolution and discontinuous changes in the sequences. Our theory represents 
the evolution within one of these clusters, considering reassortment only among human 
viruses. Thus, we predict the evolution of the fitness during each of these periods. There 
are 4 periods within the time frame 1993-2009. Figure [5] shows the measured fitness data 
averaged over these four time periods. 


We scaled Eq. (jT[) by e to fit the observed data. The predictions from Eq. dH) are shown in 
Fig. [5l We find e = 0.1 and M = 1 fit the observed data well. We assume the overall rate of 
evolution is equally contributed to by mutation and horizontal gene transfer. The average 
observed substitution rate in the lOOaa epitope region of the HA protein is 5 amino acids 
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Figure 5: Shown is the fitness of influenza virus H3N2 in humans, estimated from the 
time derivative of the frequency with which sequences appear in the GenBank sequence 
database. Each time a new circulating strain is introduced to humans, typically from pigs 
infected from birds, the fitness increases as a quasispecies expands around the strain. The 
average fitness increase over four such antigenic shifts between 1993 and 2010 is shown. 
Data are from [27]. Error bars are one standard error. Theoretical results are shown for 
the modular (M = 1, solid) and non-modular (M = 0, dotted) model predictions. Theory 
was fit to the average linear fitness increase at 5 years. Parameter values are as in Fig. |TJ 
with fitness scaled by e = 0.1. Theoretical results are also shown for e = 0.125 and M = 0 
(dashed). 


per year 



and v = 0.6. 


We interpret this result in our coarse-grained model to imply /i = .05 


The value of e required to fit the non-modular model to the data is 25% greater than the 
value required to fit the modular model to the data. That is, approximately 25% of the 
observed rate of fitness increase of the virus could be ascribed to a modular viral fitness 
landscape. Thus, to achieve the observed rate of evolution, the virus may either have 
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evolved modularity M = 1, or the virus may have evolved a 25% increase in its replication 
rate by other evolutionary mechanisms. Given the modular structure of the epitopes on 
the haemagglutinin protein of the virus, the modular nature of the viral segments, and the 
modular nature of naive, B cell, and T cell immune responses we suggest that influenza 
has likely evolved a modular fitness landscape. 


6 Generalization to Investigate the Effect of Landscape Rugged¬ 
ness 


We here generalize Eq. (JT|) to p-spin interactions, q letters in the alphabet from which 
the sequences are constructed, and what is known as the GNK random matrix form of 


311 ]. These generalizations allow us to investigate the effect of landscape 


interactions 
ruggedness on the rate of evolution. 


6.1 Generalization to the p-spin SK Model 

We first generalize the SK model to interactions among p spins, which we term the pSK 
model. The pSK landscape is more rugged for increasing p. Thus, we might expect mod¬ 
ularity to play a more important role for the models with larger p. The generalized fitness 
is written as 

f = ^ ] Jiii2...ip^iii2...i p Si 1 Si 2 ...Sip + 2L (8) 

i\i2...i v 

The J and A tensors are symmetric, that is, Jq; 2 ...j p = J[q i2 ...j p ] and A ili2 ... ip = A[ ili2 ... ip ], 
where [iii 2 ---ip] is any permutation of The probability of a connection is Ci/-L p_1 

inside the block, \i\/l\ = \} 2 /l\ = an d Go /L p ~ l outside the blocks. The 

connections are defined as 

E = C (9) 

i2 ...ip 

Since C = Co[l — (j^) p_1 ] + Gi(-^) p_1 , it follows that C\ = C[ATR' P_1 + (1 — M)\, and 
C 0 = C(1 - M). 

Increasing the value of p makes the landscape more rugged, and so it is more difficult for 
system to reach a given value of the fitness in a finite time. Thus, modularity is expected 
to play a more significant role in giving the system an evolutionary advantage. Specifically 
we expect that the effect of modularity will show up at shorter times for larger p. As shown 
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Figure 6: Results for 2-spin, 12-spin and 30-spin SK interaction fitness when initial se¬ 
quences are random. The fitness is given by Eq. (|8|) . The parameters are as in Fig. [I] 


in Appendix B, we find 
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( 10 ) 


The modular terms increase faster with p than the non-modular terms, so the dependence 
on modularity of evolution is more significant with bigger p. Figure [6] shows the average 
fitness curves for the p-spin interaction. Due to the normalization of the J values, all fitness 
curves have the same slope at t = 0, i.e. the same values of a. However, increasing p leads 
to slower rates of fitness increase at finite time, especially for the M = 0 case. At finite 
time, for large values of p, the evolutionary advantage of larger M is more pronounced. 
That is, modularity help the sequences to evolve more efficiently. 

We also calculated the average fitness for the initial condition of all sequences identical. 
As discussed above, many-spin effects make the landscape more rugged for larger p. We 
seek to understand this effect quantitatively for the case of same initial sequences as well. 
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As shown in Appendix B, the Taylor expansion results are 

(2 \)b = 8npL(l-l/N) 

(3!)c = -32ppL 2 (l-l/N)/N-16p 2 p 2 L{l-l/N) 

—6Ap, 2 pL(l — 1/N) — 16pupL(l — 1/N)/N 
(4 \)d = 64 n 2 vp[p - 1)LM(1 - 1/K)(1 - l/N)(l - 4 /N) 

+32ppL(l — 1/N)[(6p + 7 p 2 p 2 — 2/j u(p — 1)) 

+2pv{l — 4 /N)/K — (36 p — 10 ppL — 5 pvp — 8 pu(p — 1 ))/N 
+(4L 2 + 2L + 46 p + 4 uL + u 2 )/N 2 } 

( 11 ) 


6.2 Generalization to the Planar Potts Model 


We here generalize the alphabet from two-states, s* = ±1, to q states, with a planar Potts 
model 32(] of the fitness. The q = 2 model is often justified as a projection of the q = 4 
nucleic acid alphabet onto purines and pyrimidines. Thus, q = 4 is a relevant generalization. 
Additional q = 20 corresponds to the amino acid alphabet. The alphabet size can affect 
the evolutionary phase diagram (33|, so q is a relevant order parameter. In the planar Potts 
model, each vector spin takes on q equidistant angles, and the angle between two is defined 
by Si ■ Sj = cos dij. The fitness is 


/ — 'y ^ cos @ij + 2T (12) 

ij 

The directions of a spin are evenly spaced in the plane, so when the spin points to q 
directions, the angle between two spins is 2kn/q, where k can beO, 1, ..., q — 1. 

The evolutionary dynamics in the planar Potts model is distinct from that in the pSK 
model. As shown in Appendix B, 


a 

2b 


L(l-l/N)(l + 8 q , 2 ) 

—4L 2 (1 - 1/JV)(1 + 8 q , 2 )/N - 2pL(l - 1/N )—(1 + S q , 2 ) 


-2 uL 




q- 1 

i-4)(i + i,, 2J 


(13) 


Figure [7] displays results for two values of q. Increasing q does increase the landscape 
ruggedness in the planar Potts model. There is an evolutionary speedup provided by 
modularity. The q dependence, however, is not as dramatic as in the pSK model. 
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Figure 7: Average fitness in the planar Potts model for alphabet sizes of q = 2 or q = 20. 
Sequences are initially random. The fitness Eq. m- Other parameters are same as in Fig. 

m 

6.3 Generalization to the GNK Model 

A model related to the SK form is the GNK model, a simple form of which is 

/ = H + 2 L = Sj)Aij + 2 L (14) 

ij 

where the a matrix is a symmetric matrix with random Gaussian entries, and the A matrix 
is the same as that in Eq. ©• Other than the condition of symmetry, the entries in the 
a matrices are independently drawn from a Gaussian distribution of zero mean and unit 
variance in each matrix and for different i,j. Thus, other than the condition aij = crji, 
the values are independent for different i, j. Si, or Sj. This form is generalized to a p-spin, 
(/-state form as 

f = 'y ^ cr iii2...*p(' s U! s i25 •••> 2T (15) 

where s % can take one of the q values. 
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For the GNK model, as shown in Appendix B, p-spin (/-state result is 


a = 2L(l-l/N)(l-l/q p )/\nq 

or2 

2b = - —(1 - 1/JV)(1 - l//)/ln g — 2 W ,£(1 - l/JV) _ * 


-2 


( 1 ~ 1 /-^ + 9/ A ) P / jV + 2/NK + 2(1 - 1/AT)/1V/ /lug 


g(l - 1/A') + 1/A' 


+4^A(1 --)(! + (A' - 1 )/q p ) /(N In q) 


As with previous models, modularity increases the rate of evolution. 


7 Conclusion 

The modular spin glass model captures several important aspects of biological evolution: 
finite population, rugged fitness landscape, modular correlations in the interactions, and 
horizontal gene transfer. We showed in both numerical simulation and analytical calcula¬ 
tions that a modular landscape allows the system to evolve higher fitness values for times 
t < t*. Using this theory to analyze fitness data extracted from influenza virus evolution, 
we find that approximately 25% of the observed rate of fitness increase of the virus could 
be ascribed to a modular viral landscape. This result is consistent with the success of 
the modular theory of viral immune recognition, termed the p ep itope theory, over the non- 
modular theory, termed p se quence • The former correlates with human influenza vaccine 
effectiveness with R 2 = 0.81, whereas the latter correlates with R 2 = 0.59. The model, in 
general, analytically demonstrates a selective pressure for the prevalence of modularity in 
biology. The present model may be useful for understanding the influence of modularity on 
other evolving biological systems, for example the HIV virus or immune system cells. 
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8 Appendix A: The Requirement that Fitness be Non-Negative 


8.1 The pSK Model 


The normalization of the coupling in Eq. m, the Jij, affects the maximum and minimum 
possible values of the H term. We require that the average energy per site of the se¬ 
quence be finite when L —>• oo, i.e. that H is on the order of L. For the 2-spin, 2-state 
interaction, the Wigner semicircle law |34j shows that a rank-L random symmetric ma¬ 
trix with = 1 V i has a minimum eigenvalue —2. Thus, if we consider the 

Si in Eq. (jT|) to be normalized as Tq sf = L , the 2 L shift in Eq. (pQ) guarantees that 
H + 2L > 0. So after considering the connection matrix A, we choose J n = 0 and 
( JijJkl ) — T $il&jk)/C. 

For the pSK case, we first assume that Jq...q is not symmetric, i.e. Jq...q with permutations 
of the same labels are drawn independently. We will consider the symmetrization afterward. 
We define 

(17) 




2p2-*p^li2-ip 


13 • • -ip 


And for any i\ 


So, 



E< j ; 


>A, 


l\l2 ...Ip! %\%‘^...%p 


= C(J 2 ) 


(18) 


^2 


*1112...Ip 


H ^ ^ Si\Si2‘‘'SipJiii2...ip^i 

i\%2—ip 

— ^ ^ s ii s i2 ( y ^ s i3'" s i p Jiii2—ipAiii2—ip) 


%ii 2 


13...ip 


$ii $12 -^1112 


Ki\i2 K l2%1 


— Si.-\ Si 


(19) 


In the last step we symmetrize the K matrix so that we can apply the Wigner semicircle 
law. From the semicircle law, if we want the minimum of the p-spin interaction to be —2L, 
we need (AT *2 + A: 2 u)/2) 2 = 1 for every i\. We, thus, find 

,Khi, +K,._ ( (K _ + Kr , , = 


E< ; 


iE( k E + a 'S») = ?T) = i 


( 20 ) 


12 


*2 


So (Jj 2 j ) = 2 jC for asymmetric Jq...q. We symmetrize the Jq...j 


V • 

u lll2...lp 


I ^2 1*2...ip 


( 21 ) 


permutations 
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The Wigner approach to calculate the minimal possible value of H is overly conservative, 
because in this approach all possible real vectors, s, are considered, whereas in our appli¬ 
cation Si = ±1. We here use extreme value theory to take this constraint into account. 
We use the 2-spin interaction to exemplify the method. As the J matrix is symmetric, 
its elements are not independent, so the terms in H are not independent. We rewrite H, 
noting the diagonal terms are zero, as 

H = y^ JijSiSjAij = 2 JijSiSjAij = 2 H' (23) 

ij i<j 

The terms in H' are independent. After taking into account the connection matrix, there 
are LC /2 non-zero independent elements. As Si is either +1 or —1, multiplying them 
does not change the distribution of each element, so the H' is the sum of LC /2 Gaussian 
variables. We choose the variance of each element as 1/C, so H' ~ X(0, L/2), and H is 
twice of this, so 

H ~ N(0,2L) (24) 


There are 2 L different sequences, so there are 2 L different H, which differ by a few terms 
instead of all terms, so they are not independent. We seek the smallest H. Slepian’s 
lemma [iisl ] shows that for Gaussian variables A/ and 1/ with average 0, if (A/) = (Y) 2 ), 
and ( X-iXj) < (YiYj) for all i and j, then for any x, 

P( max X; < x) < P( max Y) < x) (25) 

l<i<n l<i<n 


Thus, since X* and Y) have zero mean, the minimum of the less correlated variables is 
smaller than that of the more correlated ones. Thus, we still use extreme value theory 
to determine the minimum for the same number of independent variables with the same 
distribution, and we use this number as an estimate of a lower bound. When we add this 
estimated value to H, Slepian’s lemma guarantees the non-negativity of the fitness. We 
will show shortly that this estimated value is quite near the true value. The extreme value 
theory calculation proceeds as follows: 


cH u 


P(H)dH = 1/2 1 


(26) 


where 2 L is the sample space size when the spin has 2 directions. From the symmetry of 


Gaussian distribution and H max = we find 


1 r 

V2^V2L J- 


^ rH max /2\/L 


e~ x /4L dx = 0.5 H—7= / e~ x ‘dx 


L 


'n Jo 

= 0.5 + 0.5erf(7f max /2\/Z) 

e -^max/4 L 


1 - 


\7fi"77 m ax/2\/T 


= 

where erf() is the error function. Since L is large, we find 

g~ ^max/4i 


-H 2 /4L -H 2 . I4L o-L 

g max/ zzr g min/ = ^ 


So 


\/^^max / 2 'n/T 

F min = -2L\/ln2 « -2 x 0.832L 


(27) 

(28) 
(29) 


Numerical results show that the exact value is —2 x 0.763L [36|, quite close to the value 
in Eq. (12911 . In this way, for the pSK interaction, a normalization of 


= 2 /Cp\ 


(30) 


gives a minimum > —2Ly/\n 2. This bound becomes exact as p —» oo 37]. As p becomes 
bigger, the landscape becomes more rugged, and the sequences becomes more uncorrelated. 
The correlation of general p falls between that of p = 2 and p —>• oo, so according to Slepian’s 
lemma, the exact minimum of any p will fall between that of p = 2 and that of p —>• oo. 
As 0.832 ~ 1 and 0.763 ~ 1, we will neglect this factor, and set the shift to 2 L when 
(J 2 ) = 2/plC, which guarantees / > 0. 

Here we also calculate (H 2 ), which is used to obtain the Taylor expansion results in section 
[9l For a two-spin interaction we find 


< ff2 > = (EE 


ij kl 


^ ^ ^ $il$jk/C)/^ij/S.klSiSjSkSi 

ij kl 

2^ A ij/C = 2 L 


(31) 


We similarly calculate p-spin results. We find that (H 2 ) is always 2 L for all p under this 
normalization scheme. 
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8.2 The Planar Potts Model 


For the planar Potts model, we consider that q is even, so the configuration space of q = 2 
is a subset of that of q > 2, and the minimum of q > 2 is no bigger than that of q = 2 
Potts model, which is —0.763 x 2 L [i^j. We consider two limiting cases to obtain the lower 
bound of the minimum. First, when q —>• oo, the planar Potts model becomes the XY 
model, which is defined in this case as H = JT • Jij s i ' s j = Yhij Jij cos @ij > where 0 t j is 
the angle between vector Sj and Sj, and these vectors s, can point to any direction in a 
two dimensional space. So the configuration space of the planar Potts model is a subset of 
that of the XY model, and the minimum of planar Potts model is no smaller than that of 
the XY model. Numerical results show that the ground state energy for the XY model is 
roughly 0.90 x 2 L Q. 

Second, the vectors in the XY model are restricted in a two dimensional space. If we 
generalize it to an n dimensional space, with S{ = (x \, x' 2 , ...,x n ), and x 2 + x% +... + £ 2 = 1, 
we obtain the n vector model. When n —>• oo, the model becomes the spherical model, 
the exact minimum of H can be calculated analytically as —2 L when (J 2 j) = 1/L (39| . 
Similarly, we see that the configuration space of the XY model is a subset of the spherical 
model, and the minimum of the XY model is no smaller than the spherical model. So 
considering the above two limiting cases, the minimum of the planar Potts model is no 
smaller than —2 L when (J 2 j) = 1/L. Thus, we set the shift as 2 L to guarantee that 

/> o. 

We still need to consider the effect of the connection matrix. We use extreme value theory. 
We consider that the matrix elements of J are randomly chosen, so that H becomes sum 
of LC Gaussian-like variables instead of L 2 variables, so the minimum is y/C/L times that 
of the minimum without the connection matrix. To normalize the minimum back to —2L, 
we finally choose (Jfj) = 1/C. 

We calculate (H 2 ) for the planar Potts model. When q = 2, the average value of cos 2 dij 
is 1, while when q > 2, the value is 1/2. Thus, 


(H 2 ) 



2<4>(1 + <$, )2 )A w 

ij 


L(1 + Sq : 2 ) 


(32) 
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8.3 The GNK Model 


We use extreme value theory to discuss the minimum and normalization of the GNK model. 
When (a 2 ) = 2p\/C , 

H ~ p\N(0, (a 2 )LC/pl) ~ JV(0, 2L) (33) 

so the distribution of the H for the GNK model is the same as that of the pSK model. The 
correlation induced by the dependence of H on the s t is, however, different in these two 
models. Here we follow the method of jiff ]. We have two sets of variables with the same set 
size, one is the p spin interaction GNK H, denoted as Xj, and normalized so that (X 2 ) = 1, 
and the other is a set of Gaussian variables Yj, and (Y 2 ) = 1, ( YjYj) = c < 1. There are q L 
variables in each set, and we know that (Tnax) is \/l — c times the maximum of the_same 
number of uncorrelated Gaussian variables with the same variance and average H , so 
from Eq. (f29l) (Tn ax ) = \/2L(l — c) In q. We define a new variable, Zj = \J 1 — tX t + \ZtYj, 
where 0 < t < 1, and we define r(t) = P(Zi(t) < a ,..., Z q L(t) < a). So r{t = 0) is the 
probability that the maximum of Xi is smaller than a, while r(t = 1 ) is the probability that 
the maximum of Yj is smaller than a. We seek to make r(0) < r( 1), so that X max > h nax , 
so we seek dr{t)/dt > 0 for 0 < t < 1. It is shown that on page 71 of [t4^]: 

dr(t)/dt = - 2 - (XiXj)) d2 *%l]g z f qL) dZi...dZ qL (34) 

IJ 

where <f> is the joint distribution of Z. For example, the term corresponding to i = 
1 , j = 2 is ((YY 2 ) - (X^))/^- • ■f° oo <l>(t,a,a,Z 3 ,...,Z q L)dZ 3 ...dZ q L, and • • 
' f-oo a , a, Z 3 ,..., Z q L )dZ^...dZ q L « cj){t,Zi = a, Z 2 = a), as the probability that all 
other variables is smaller than the maximum we are looking for is approximately 1 . (YjYj) = 
c for all i 7 ^ j, and (XjXj) is the same for pairs with djj the same, so we group the pairs 
according to their Hamming distance, and rewrite Eq. (1341) as 


dr(t)/dt = j (^) ( q ~ 1 ) d ( c “ ( x i x j))(Kt, z i = a , Zj = a) (35) 

i d= 1 ^ ' 

where Xj is any sequence satisfying djj = d. As the system is totally random, it does not 
matter what sequence j is, and we can set it as the first sequence and rewrite it as 

dr(t)/dt = jq L ^ (q - 1 ) d {c - (XiXj))(f)(t, Zi = a, Zj = a) (36) 


where Xj is any sequence satisfying du = d. As only the integral depends on t, we can 
write it as 


r(!) - r(0) = jq L ^ (q - 1 ) d (c - (X\X 1 )) J </>(«, Z x = a, = a) (37) 
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when r(0) = r(l), we find 


Eli © (g - l) d Jp ^1 = q, = a)dt(XiXi) 
Eli ffi (<? - !) d fo z i = a >= a )^ 


We first calculate (. X^Xj). The correlation is 


= (P ! ) 2 X] °'( s ii- a <p) ( 7 ( s i 1 - a <p) A i 1 ...<p 
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We initially neglect the A, as among the different kinds of < 7 , there are ( L p remaining 

unchanged, ( XiXj ) = {Xf)( L ~ d )/(^) = ( L ~ d ) / Q if d < L - p, otherwise it is 0. Now 
considering the connection matrix, if M = 0, the connections are randomly chosen, so we 
expect that the result does not change. If M = 1, all connections fall into small modules, 
and again the d changed spins are randomly distributed in each module, so out of the l 
spins in each module, dl/L spins have changed. Thus, in each module, out of the (^) spins, 
{ l ~f L ) remain unchanged, so (X^j) = (X?)( l ~f L )/Q « ( l-dl/Lf/F « (. L-df/D p « 

(' L ~ d ) / (p). For general M, for a given < 7 , consider that h spins fall out of the module, and p— 
h will fall in the module. Among the ( L h l ) possibilities outside of the module, 
remain unchanged, so the ratio of unchanged over all is is (( i- 0(^- d )/ L )/(^-*) ~ (L — 
d) h /L h . For the { p l _ h ) possible choices inside the module, is unchanged, so the 

ratio is / ( p -jf ~ {L — d) p ~ h /L p ~ h . So the probability that a a is unchanged 

is (L - d) h /L h x (L - d) p ~ h /L p ~ h = (L- d) p /L p « ( X “ d )/(£). So for all M, (X;Xj> = 

<*mV)/© = (V)/©- Also ’ ©(V)/© = (V)> so 

c = E©r (V) (? - i) d £ m gi= «.^ =^ (40) 

Ed=i © (9 - !) d fo = a, Zi = a)dt 

The integral part simplifies as 

/ a 

I ^2, ••••, Zi— h a, ZgL )d^2 • • 'dZi—\dZiJ r \...dZgL Z\ — a, Z^ — u) 

-00 J —00 

We write the joint distribution of the Gaussian variables as 


f{x 1 , ...,x fc ) = 


_ e -i(Z-(z)) r E-i(z-<z)) 
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where E is the covariance matrix, and |E| is the determinant of E. For our particular case, 
where there are only two variables with unit variance and zero average, the covariance 
matrix is given by 

((zf) (^ 2 »\ _ n p\ 

- \{Z X Z 2 ) (Z|> ) © \) 

where p = {Z\Z 2 ). So |E| = 1 — p 2 and 



E " 1 = 


1 

1 - p- 




(43) 


It follows that 


-- 2 (Z-(Z)) t ^-\Z-(Z)) = 


Z\ + — 2pZ\Z 2 

2-2 p 2 


(44) 


2 

When both variables equal a, it is — So putting everything into Eq. (SB, the probabil¬ 
ity that both variables are a is e _ “ 2 ^ 1+p '> / y/\ — p 2 ir. We calculate Pij(t) = [(l — t)(X l Xj) + 

HYiYjM 1 -1 +1) = tc + (1 - 1 ) (V)/© = tc + (1 - f)(i - {y. 

Note that a = y^2L(l — c) In q, so Eq. gOD is a self-consistent equation 


Ed=i { L 7)(Q ~ M fo e~ 2L ^~ c ^ ln '?/[ 1+tc+ ( 1-t )( 1 -r) p ]/©I - [tc+ (1 -t)(l - ±Y] 2 dt 

Ei, ©(? -1 ) d f 0 ‘ e- 2I(1 - c) '", / li+«+c-‘)(i-f)'l/^/i - [tc + (i - t )(i - iypn 

(45) 

The solution for L = 100, p = 2, and g = 2 is c = 0.331, so y/1 — c = 0.82. With p = 2, 
q = 2, and L = 1000, c = 0.330, the value of c near that for infinite L. We also applied this 
method to the pSK model. For p = 2, q = 2, the correlation is 1 — 4 d(L — d)/L 2 . So 

Y7d= it 1 - 4d(L - d)/L 2 ] Jo 1 e -2i(l-c)l n g/[i+te+(i- t )(l-fF]/^/ 1 _ [ te+ (! _ t )(! _ d)P]2 di 

Eti © fo e- 2i ( 1_c ) lni J/[ 1+te +(i- t )(i-r) p ]/©1 - [tc + (1 - t)(l - |) P N* 

(46) 

When p = 2, q = 2, and L = 100 we find c = 0.128, and H m \„ = 0.777 x 2L, quite 
near the numerical result /7 m ; n = 0.763 x 2L. This self-consistent method, thus, is fairly 
accurate. 

For a given q, larger p lead to smaller c using Slepian’s lemma Eq. (1251) . To prove this, first 
assume V t = H^(S t ) and W, = H^(S t ) are GNK model variables with py > p\y. For 
any sequence Si, we group all other sequences according to the Hamming distance between 
them, so the group with Hamming distance d contains ©(<7 — l) d variables. Assume Sj 
has a Hamming distance d from S i} so (VjVj) = ©©)/©) = (©©/©> ( w i w j) = 
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Table 1: Self-consistent calculation of c as a function of q using Eq. (14511 . Here L = 100 


anc 

q 

lp = 2. 

3 4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

c 

0.219 0.17 

0.15 

0.129 

0.116 

0.107 

0.100 

0.094 

0.089 

0.085 

0.082 

0.079 

0.076 

0.074 

0.072 

0.070 

0.068 

0.067 


Table 2: Self-consistent calculation of c as function of q using Eq. (1451) . Here L = 1000000 
and p = 2. 


q 

40 

80 

160 

320 

640 

1280 

2560 

5120 

10240 

c 

0.049 

0.038 

0.031 

0.026 

0.022 

0.019 

0.016 

0.014 

0.013 


= ( i_ D/(S)- As PV > W, (‘70 < ( L J r ), » TO) < (WiWj). As the 
Si is chosen randomly, the correlation between any pair of V is smaller than that of the 
corresponding pair of W. and according to Slepian’s lemma, the minimum of V is smaller. 
Thus, it is proved that larger p lead to smaller minima. 

We calculate the results for p = 2 of different q. Shown in Table |T] is the result of L = 100, 
p = 2 and 3 < q < 20. We see that c decreases monotonically for increasing q in this range. 
We also calculated c for large q and L = 1000000, Table [2j 

It is apparent that all results for c fall below 0.33, so the minimum of GNK model will 
fall between 0.82 x \J2Llnq and y/2Llnq for any p and q. As 0.82 ~ 1, we neglect this 
factor, i.e. conservatively assure that the fitness is positive, not merely non-negative. When 
( H 2 ) = 2L/\nq, the minimum will be 2 L. 

In summary, for all models discussed in this appendix, a shift of 2 L assures that the fitness 
is positive. 


9 Appendix B: Calculation of Taylor Series Expansion for 
Average Fitness 

We here describe how the coefficients in the Taylor series expansion of ( f(t )} are calcu¬ 
lated. 

9.1 Organizing the Terms in the Taylor Series Expansion 

We define f({n a }) = Yin f(Si) n i/N, the average fitness of an individual of configuration 
{n a }. The first order result of Eq. Q is a linear combination of P({n a },t), which can be 
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divided into three parts: / part, from natural selection; /j part, from mutation; v part 
from horizontal gene transfer. In a compact form 


— = CP = C f P + C ll P + C v P (47) 

and 

^ = £/(K})cr({n a }, f ) (48) 

{«4 

In this way we obtain the form of higher order results, for example, the second order result 
can be expressed as 

JP = = X] f({ n a})£~P = X] /({ n “})(^/ + + ^v){Cf + C-n + £z/)-P (49) 

{n 0 } {n a } 


which is divided into 9 terms: 


^ 2 (/) 

dt 2 


/({ n a})(£/£/ + CfC^ + CjC v + C^Cf 

{n a } 

+ c^c v + c v c s + c v c„ + CMP 


(50) 


Note that £ /t , £„ are not commutative with each other (but of course commutative 
with itself), for example CfC^ / C^Cf, since these two correspond to different evolutionary 
processes of the population. As 2b = - |t=o, we can divide 2b into nine terms, and 

name them Tff, Tf M , T/„, T /t /, T ]L]l , T lxv , T„/, T, y/i , T w . Also note that the naming is in 
a sense inverted, for example Tf^ term, which comes from CfC^P, actually represents the 
process that a mutation happens before natural selection. 

In this way, the third order term consists of 27 terms, and the r-tli order term consists 3 r 
terms in general. Each term can be named according to the order of C operators involved. 
Below we will discuss how to compute each part of an r-th order term. First we discuss 
eliminating terms that do not contribute, then we calculate terms for different models. 
We will start with the simplest 2-spin, 2-state model, and then discuss pSK model, planar 
Potts model, and GNK model. 


9.2 Identification of the Terms that Vanish 

The number of terms increases exponentially with order, so we would like to eliminate 
some terms that vanish from some considerations. We discuss this according to the initial 
conditions, that is, random initial sequences and identical initial sequences. 
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For both initial conditions, a term that does not contain natural selection processes van¬ 
ishes. The reason is that the Taylor expansion results correspond to change of fitness, so 
it has a factor of A/ proportional to some linear combinations of H, which is in turn some 
linear combinations of J or a. For terms that only contain mutations and horizontal gene 
transfers, the result is proportional to linear combinations of J or a, which vanish when 
we take an average. Only when there is natural selection can J or a be multiplied into 
second order or higher even order forms, which are non-zero when averaged. So only terms 
containing natural selections can contribute. 

For the random initial sequences, the modular term appears in second order. From the 
principle above, the first order terms T /t and T Ul and the second order terms , T^ u , T u/J/ 
and T uu are all 0 . In addition, for the Tf^ term, first an individual mutates then a natural 
selection happens. As the initial system is totally random, the mutation in the first step 
does not change the system, so this term is 0 . Then, up to second order, there are five 
non-zero terms: Tf, Tff, Tf u , T^f and T u f. 

For the same initial sequences, the modular terms appear in the fourth order, meaning that 
we need to calculate 3 + 9 + 27 + 81 = 120 terms. Fortunately, the initial conditions are so 
special that we can greatly reduce our burden. As discussed above, terms not containing 
/ are automatically zero, so 120 — 2 — 4 — 8 — 16 = 90 terms can be non-zero. In addition, 
if a natural selection or horizontal gene transfer process happens first without a mutation, 
then nothing is changed as every sequence is initially the same, making this term zero. So 
any term not ending with [i is zero, leaving behind 0 + 1 + 5 +19 = 25 non-zero terms: 0 
first order term, 1 second order term T/„,5 third order terms satisfying three-letter-array 
ending with fi and containing a / in the first two letters and 19 fourth order terms satisfying 
four-letter-array ending with /x and containing a / in the first three letters. Additionally, 
for a term starting with v (the last process is horizontal gene transfer), if there were fewer 
than 2 mutational processes before horizontal gene transfer, then the term is zero. The 
reason is that if only one mutation happened, all mutated individuals are the same, so a 
horizontal gene transfer changes the population only when it involves a mutated individual 
and an unmutated individual. And for a change to occur, the horizontal gene transfer must 
transfer the part of the sequence that is mutated. The probability of the unmutated one 
becoming a mutated one through horizontal gene transfer is the same as the mutated one 
becoming an unmutated one, and the change of fitness of these two processes cancel, so 
the contribution is zero. In this way, T u f^ is zero, and T u ffT v f vM and T uv f^ is zero. An 
additional two terms, T^yf^ and are found to be zero after calculation, so there is 

one non-zero second order term, four non-zero third order terms, and fourteen non-zero 
fourth order terms, totaling nineteen non-zero terms. 


26 



9.3 The p = 2 SK Model with Random Initial Sequences 


To illustrate the calculation for the random initial sequences, we use Tf term as an exam¬ 
ple. 


So 


£ f \ni,...,n 2 L) = ^2^^-f a (\na + l,...,n b -l)-\n a ,...,n b )) 


ab 


(51) 


T f = E/({n4K/^(K},°) 

{n a } 


Y P ({ n 4; °) Y -jrfa [/(K + 1, n b - 1)) - f(\n a ,rib})] 


{n a } 


ab 




JV 


n a rib £ fa fb 


{n a } 


ab 


N N 


I \ 

En a n b f fa — fb 

N Ti 


ab 


N J N 


E rigTib n H a — H b 


ab 


N 


E n a rib tt H a — H b 

~ir Ha 


ab 


N 


(52) 


The last equation holds because average over first order terms of H is always zero. As 
the initial sequences are totally random, when we pick out a particular individual a, we 
expect that from the view of a, the other N — 1 individuals are totally random, so their H 
is uncorrelated with that of a. As (H) = 0, Yh b n b^aHb = 0. 
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So 


and, 

Similarly, we can obtain, 


TT H a — Hi, 


ab 


E 

^ ab 


N 


n a n b H 2 
N 2 


N 


N(N-l) 2 
N 2 

(1 1/^0 ^ 'j i'Jij Jkl) ^ij^klSjSjSkSl 

ijkl 


2(1 — l/N) ^ Ajj/C 
b 

2L(1 - 1/JV) 


a = T f 


T ff = -8L 2 (1 -1/N)/N 
T^f = — 8/cL(l — 1/1V) 

Tjv = -4i/L(l - 1/1V) [Af + (1 - M)/K\ /N 
T uf = AvL(l — l/N)(l — 3/JV) (1 — 1/K) (M — 1) 


and 

2b = T ff + t m/ + T > + T *T 


(53) 

(54) 


(55) 

(56) 


9.4 The p = 2 SK model with Identical Initial Sequences 

To illustrate how to calculate the terms for identical initial sequences we use Tf^ term 
as an example. We assume the initial sequences are Sq, and the initial state is {n a }o = 
(IV, 0, ...,0) = |iV5 ei o). £ /( takes the state to 

C^NSe, o) =fiNj2i\(N- l)<y e ,o + Se,a) - \NS e , 0 )] (57) 

a=d 0 
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A subsequent Cf takes the state to 


CfC^NSefi) 


»NCf ^2 [\(N- 1)4,0 + 4,a) - |lV4,o>] 

a=d 0 


^e 


a=d 0 L 


N — 1 


N 


f(S 0 )(\N6 efi )-\(N-l)S e , 0 + S e , a )) 


+ ^J^-f( S aM N - 2)4,0 + 24, a) - I (N- 1)4,0 + 4,a)) 


fl(N-l) J2 [/(5 0 )(|JV4,o) -\(N- 1)4,0 + 4, a)) 

a=d 0 

+f(S a MN - 2)4,0 + 24,a) - I (N- 1)4,0 + 4,a))] (58) 


So after two operations, P(|JV4, 0 ),0) = //(IV-1) ^2 a=d0 /(5 0 ), P(|(AT-l)4,o + 4,o),0) = 
-//(IV - 1) £ a=so (/(5 a ) + f(So )) and P(\(N - 2)4,o + 24,a), 0) = //(IV - 1) £ a=90 /(5 a ), 
and we find 

T flt = ^/(KIK^P ({n a },0) 

{««} 

= /(|iV4, 0 ))P(|iV4,o),o) 

+/(|(A" - 1)4,0 + 4,a)) J P(|(^v - 1)4,0 + 4,a), 0) 

+f(\(N - 2)4,0 + 26 e , a ))P(\(N - 2)4,0 + 24,a), 0) 

= //(l - 1/1V) [Nf(S 0 ) 2 - (IV - l)/(5 0 )(/(5 0 ) + /(^a)) 

a=d 0 

-/(5a)(/(5o) + /(5a)) + (IV - 2)/(5a)/(5o) + 2/(5a) 2 ] 

= //(l - 1/1V) £ (/(5a) - /(5o)) 2 (59) 

a=d0 

in which 

E [/(« -/(So)] 2 = EEE Jij «4/AjjAfc/[l — (1 - 24/)(l - 24/)] 

a=d0 a=d0 /c/ 

x[l-(1-24^(1-240] ( 60 ) 
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if we average over JijJki , we find 


a=d 0 


So 


and 


Y - (1 - 2J 0 j)(l - 2<5 aj -)] 

a=<90 ijkl 

x[l-(l-28 ak ){l-28 al )] 

Y Yx^iktijl + — Sij)AijA k i/C 

a=d 0 ijkl 

X[1 - (1 - 2<5 0 j)(l - 2<J ai )][l - (1 - 2<U)(1 - 2 5 al )] 

2 Y Y Ai ^ - (! - 2 ^)(! - 2 ^-)] 2 /C' 

a=<90 


8 EE A-ij(S a i + 8aj)/C 

a=d 0 


16L 

(61) 

Tfij = 16/xL(l - 1/AO 

(62) 

to 

o- 

II 

00 

■& 

(63) 
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Similarly, we find 


T fw . = — 128/r 2 L(l — 1/N) 

Tfufi = -32/xi/L(l - 1/N)/N 
Tf fl i = — 64pX 2 (l — 1/N)/N 
Tfiffi = -64/z 2 L(1 - 1/N) 

T ffftl = 256/xL 3 (1 — l/N)/N 2 + 128/iL 2 (l — 1/N)/N 2 

+256/iL(l - 1/AQ ^3 - — + 

Tffnn = 512/x 2 L 2 (1 — 1/N)/N 
Tffun = 128//zaL 2 (1 — 1/N)/N 2 
Tfufn = 512/x 2 L 2 (1 — 1/N)/N 
T fmi = 1024p 3 L(l — 1/N) 

Tf^fj, = 256/x 2 z/L(1 - 1/1V)/1V 
Tfvfp = U^uL\l-l/N)/N 2 

Tfvnn = 128// 2 ^L(1 — 1/N) [2 + (1 — M) (1 — 1/K)\ /N 
T fvvii = 6Afiu 2 L(l-l/N)/N 2 
Trf Sv . = 256p 2 L 2 (l — 1/N)/N 
Trfw = 512// 3 L(1 — 1/IV) 

= 128/x 2 zaL( 1 — l/iV’j/iV 
= 256p 3 L(l — 1/IV) 

T„ /wi = -l2^ 2 uL{l-l/N)(l-2>/N)(l-M)(l-l/K) (64) 

and 

(3!)c = Tf^ + Tf„n + Tffn + 

(4!)d = Tfffn + Tffnp + Tffvn + Tfpfu + Tfunp + Tfuvn + Tfvfn 

+ T fu l i l i + Tfuufi + T nffiJ, + T/xfuiJ, + T nfvn + ( 65 ) 

9.5 The pSK Interaction 

For the p-spin interaction, the relationship between different Ji ljmmmi i is quite similar to the 
p = 2 case, and the principles eliminating zero terms are the same. Moreover, the physical 
processes corresponding to each Taylor expansion term remain the same, so we just need 
to change the form of the H. For example, to obtain Tf^ term of same initial conditions, 
Eq. (1591) still holds, but we need to use the new H to calculate ^2 a= Q 0 [f(S a ) — /(So)] 2 - 
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For the new form, 


£ [/«>) - /(So)] 2 

a=d 0 


f f 

a=80ii...ipji...j p 


x[i - n? =1 (i - 2^)][i - u p q=1 (i - 2<y aJ ,)] 


X] F pl Cp\ ( 2 F F, ) A h-* 

a=(90ii...i P ” y <?=1 

^ F F ^h-i P ^2^a,i q 

a=dO i= 1 

- F A n i 

(J / y “U —*p 

2i.. .2p 

8pL 


( 66 ) 


So Tf tl = 8ppL(l — 1/N). In this way, we can obtain for p-spin interaction random initial 
sequences, 

7) = 2L(1 — 1/iV) 

Tff = -8L\l-l/N)/N 
Tfif = -4ppL(l-l/N) 

Tfu = -^L(1-1/N)^M+^J (l-M^/N 

T vf = 2puL(l-l/N)(l-3/N)(l-(^J Vm-1) (67) 


a = T f 

2b = T ff + T vf + T fv + T "f ( 68 ) 
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Also, we find for initial conditions of identical sequences, 


Tf^ = -Mp 2 pL(l - 1/N) 

Tfv/j = -lQpvpL(l - 1/N)/N 

T ff n = -32fipL 2 {l-l/N)/N 

= — 16/j, 2 p 2 L(l - 1/N) 

T fffljl = 128ppL 3 (l - l/N)/N 2 + 64 W L 2 (1 - l/iV)/iV 2 

6App 2 L(l - 1/N) (s - ^ 

Tffw = 128p 2 p 2 L 2 (l-l/N)/N 

T ff vn = MpvpL 2 (l - 1/N)/N 2 

T flxfll = 128p 2 p 2 L 2 (l — 1/N)/N 

Tfnnn = 128/xVL(l - 1/N) 

T fia ,p = Mp 2 up 2 L(l - 1/N)/N 

T fvf n = 64pvpL 2 (l-l/N)/N 2 

T Svm = Mp 2 vL{l - 1/N) [p 2 +p(p - 1)(1 - M) (1 - 1/ K)\ /N 

Tfuvn = 32fiv 2 pL(l - 1/N)/N 2 

Trffr = 64^VL 2 (1 - 1/N)/N 

T^fnn = Mp 3 p 3 L{l - 1/N) 

Ttfvn = 32p 2 up 2 L(l - 1/N)/N 

= 32p 3 p 3 L(l - 1/N) 

T vf w = -Mp 2 vp(p-l)L(l-l/N)(l-3/N)(l-M)(l-l/K) (69) 


a = T f 

(2 -)b = T ff + T^f + TfyTyf 
(3!)c = Tf^ + Tf Vfl + Tff M + T^f^ 

(4!)d = Tfffp + Tffw + Tff „„ + T f/lffl + + T ffiu/1 + T fvfli 

TTfufifi + T Svvil + Ttffp + Ttfw + T flfufl + T^f^ + T„ fflfl (70) 


9.6 The Planar Potts Model with Random Initial Sequences 

The non-zero terms are still Tf, Tff, T^f, Tf v and T v f. As (H 2 ) is changed, all terms 
will also change. In addition, the T^f term has a mutational process, which depends 
on the number of directions a spin can have. We assume that after a mutation, a spin 
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randomly chooses a different direction as before. Using methods similar to Eq. (1521) . we 
find T fJ f = |u]Ug a (l — l/N) (H a (Hg a — H a )). Considering the different directions, we 
find 


22 ( H a ( H da - H a )) 


da 


^ ^ ^ ^ Jij^ijSiSj COS Oij ^ ^ JkiAfziSkSl (COS COS Ofol) 

da \ ij kl t 

22\22 2-^ A ij cos Oij (cos d\j - cos 0 % j) \ 


da \ ij 


(71) 


where O'- is the same as 0^ if neither i nor j is mutated, otherwise O'- has the same 
probability to be any allowed value other than 0ij. So we only need to consider the case 
when either i or j is mutated, and obtain an extra factor of 2 from this, 

1 


22 { 22 2 c^ ij cos %( cos 0'ij ~ cos Oij) 


da \ ij 


£ \ 22 A b' cos 0 b( COS Oij ~ cos Oij) 


(72) 


and 


Then, 


(cos 0^ cos 0[j) = 


q- 1 


22 COS Or COS 0 S 


r^=s 


1 


K q~ 1 
1 

q-1 


( ^^ cos 0 r — cos 0 S ) cos 0 S 

r 

(cos 2 0 S ) 


(73) 


( H a (H da - H a )) = \g22 A b' cos 0 b( cos O'ij - cos %) 


*7 


4 

C 


Z! A b (-^y - l)(cos 2 0) 


= -2L- 


—4 L —-—— (cos 2 0) 


y(l + ^, 2 ) 


(74) 
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The other terms are 


T, f = -2 M L(1 - VN)-^-(l + 5 qa ) 


a = T f = L(l-l/N)(l+S q , 2 ) 


= —4(1 + 5 q ^)L 2 (l — 1/N)/N 

= —2(1 + <5 g 2 )/xL(1 — l/N )— 

q — 1 

= -2(1 + S q , 2 )vL(l - l/N) [M + (1 - M)/K]/N 
= 2(1 + 8 q , 2 )vL{l - 1/N)(1 - 3/!V)(l - 1/K)(M - 1) 

26 = T ff + T m/ + T N + T vf 


9.7 GNK Model Calculation 


The processes corresponding to different terms are still the same, but for the GNK model, 
as the H is quite different, the Taylor expansion results will change. For example, for two 
randomly chosen sequences S a and Sf ,, their correlation will be non-zero. Instead, it will 
be 

(H a Hb) = / a ji---jp( s i 1 ■■■ s i p )^h-jp / 

= (p!) 2 E 


h<...<i p 

= (P ! ) 2 I] ^(<7 2 )&h...i 
= p ] E ^(^ 2 ) A u...i P 


%\...%p 

1 2 


= p\--^r-LC 

q p p\C in q 

2 L , , 

= —;— (79) 

qP In q 

where the 1 /q p factor comes from the fact that only when the states of all corresponding 
spins match can the correlations of a be nonzero. 
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Similarly we calculate all terms and obtain 


a = T f = 2L(1 - 1/N)(1 - l/q p )/lnq 


(80) 


and 

2b 


T ff = ~ 


T»f 

Tf, 

T,f 


T ff + T /U + T f, + T,f 

8L 2 (1 - 1/JV) 


(1 - 1 If) 


Nlnq 
—2fipL(l — l/N)/lnq 
AuK(l - 1/N) 


Nlnq 


L( 1 - M) + Q/ h y + LM / K + L ( X _ i/K)M/q p 


2 vK 


In q 


(M — 1)(1 — l/iV)(l — 3/N) 


1 + l/q p - 


1 - 1/K + q/K 


q-q/K+l/K V^ 
Q ) 


(81) 


10 Appendix C: Calculation of the Taylor Series Expansion 
for the Sequence Divergence 


We here describe how the Taylor expansion series of sequence divergence Eq. 0 are cal¬ 
culated. 

As the divergence is D = YlH= l {^ L ~ So ‘^' Sa ^ ) : it is determined by the changes to the 
sequences of the population, which can be tracked using Eq. ©• The order of the terms 
corresponds to the number of processed involved, for example, second order terms involve 
two processes. Using conventions developed in section 19.11 we divide the terms according 
to what evolutionary processes it involves, for example, the term which is the result of 
a mutational process followed by a horizontal gene transfer is called D Ufl , similar to the 
naming norm used in section 19.11 So 

a = D j + D^ + D u 

(2!)/3 = Dff + Dfn + Df u + D^f + + D^ v + D„j + D Ufl + D ul/ (82) 

We use Df term as an example. From Eq. (1511) . after a natural selection process, one 
sequence Sf, is replaced by sequence S a . If it is replaced by itself, nothing is changed. 
Otherwise, as the initial sequences is totally random, the number of sites changed is on 
average L/ 2, and the probability of this is (/(a))( 1 — 1 /N) = 2L(1 — 1/IV). As in the whole 
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population, only this sequence is changed, the divergence can be calculated as 



1 / L - S b (t) ■ S b (0) 


2 


2L(1 - l/N) 
L 2 ( 1 - l/N) 


L - (L/2 - L/2) 
2 


(83) 


Similarly, we can obtain other terms, 


Dfj, = fiL 

D v = uL(l — l/N)/2 

D ff = —2L 3 (1 — l/N) + L 2 (l — 1/7V)(1 — 3/N) 

Dfn = D^f = —2/xL 2 (1 — l/N) 

D fv = D u f = —vL 2 (1 — l/N) 

D IHl = -2 /i 2 L 

D, Jiy = = ~nvL{ 1 - l/N) 

D vv = -v 2 Ul - 1/A0/2 (84) 


Adding these terms together gives Eq. 0). 
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